Specific choices about how to represent complex networks can have a substantial effect on the
execution time required for the construction and analysis of those structures. In this
work we report a comparison of the effects of representing complex networks statically as matrices
or dynamically as sparse structures. Three theoretical models of complex networks are considered:
two types of Erdős-Rényi as well as the Barabási-Albert model. We investigated the effect of the
different representations with respect to the construction and measurement of several topological
properties (i.e. degree, clustering coefficient, shortest path length, and betweenness centrality). We
found that different forms of representation generally have a substantial effect on the execution time,
with the sparse representation frequently resulting in remarkably superior performance.
I. INTRODUCTION

As a consequence of the intrinsic difficulties in achiev- 
ing analytical approaches for the characterization and 
modelling of natural systems, a great deal of such investi- 
gations has to rely on computational methods. The typi- 
cal case involves the application of numerical methods in 
order to solve differential equations (e.g. [1, 2]), which is 
the most frequent situation found in practice in Physics. 
Frequently, such problems involve very large amounts of
data. Given the importance of effectively tackling these
problems, much attention and effort has been invested in
developing, implementing and applying numerical methods
which are fast and accurate (e.g. [1]). Indeed, such efforts
give rise to the important area of Computational Physics.

A peculiar situation in computational physics is found
in complex networks research [3-5], a new multidisciplinary
area of physics which has undergone an impressive
development over the last decade. Here, the investigations
rely not mainly on the numerical solution of differential
equations, but on intensive handling of matrices as
well as combinatorial or spectral methods as required for
the calculation of measurements [6] such as shortest paths,
betweenness centrality, and spectra of graphs. Despite
this distinctive nature, computational approaches
to complex networks also aim at achieving precision
and speed. The latter demand often becomes particularly
critical as a consequence of the large size of several
networks of interest, such as the Internet [7],
protein-protein interaction networks and social
interactions, to name but a few cases.

Indeed, progress on most of the remaining
challenges in complex networks research is directly
related to the ability to effectively represent and
process large structures. This can be achieved in two
ways: (i) development of more effective algorithms;
and (ii) careful and efficient implementation
of those algorithms. While much attention
has been placed recently on (i), the final performance
ultimately depends critically on the implementation,
making step (ii) particularly important for achieving good
results. The current work focuses on practical
implementation aspects related to the use of sparse or
full representation of graphs. As such, the present article
constitutes one of the few works investigating the effect
of such practical choices on the resulting efficiency
of the implementation of a set of operations
typically performed in complex networks
research, including network generation as well as the
estimation of topological properties such as the
degree, clustering coefficient, shortest path length, and
betweenness centrality.

This article starts by describing the computational 
tasks to be performed, namely the estimation of several 
topological features of the networks, and follows by pre- 
senting the adopted network models and the two types of 
representations of networks to be compared. The work 
concludes by presenting and discussing the computational
efficiency of these two representations as obtained
through computational simulations.



II. THE METHODS CHOSEN FOR THE 
EVALUATION 

It is henceforth assumed that all networks are undirected
and unweighted. Full representations of the networks
are given in terms of the respective adjacency
matrices K, such that the presence of an edge between
nodes i and j implies K(i,j) = K(j,i) = 1, with
K(i,j) = K(j,i) = 0 being otherwise imposed. The
total numbers of nodes and edges in the networks are
respectively abbreviated as N and E. A set of four
representative methods/measurements of complex networks
has been selected in order to investigate the effect of
implementation parameters and choices on the respective
performance: degree, clustering coefficient, shortest
path and betweenness centrality. Each of these methods
is briefly reviewed in the following.

Degree: The degree of a node i corresponds to the 
number of links attached to it. It can be calculated by 
adding all entries in column i of the adjacency matrix. 
The degree is an intrinsically local measurement, in the 
sense of taking into account only the links directly at- 
tached to the node. Usually, the degree is calculated for 
all the nodes of a given network. 
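
The column sum described above can be sketched in C as follows. This is only an illustrative sketch, not the code used in this work: the function names and the row-major `unsigned char` layout of the matrix are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Degree of node i from a dense n x n adjacency matrix stored row-major:
   k[r * n + c] is 1 if nodes r and c are linked and 0 otherwise.
   (Layout and names are assumptions of this sketch.) Cost: O(n). */
size_t degree_from_matrix(const unsigned char *k, size_t n, size_t i)
{
    assert(i < n);
    size_t degree = 0;
    for (size_t r = 0; r < n; r++)
        degree += k[r * n + i];   /* add all entries in column i */
    return degree;
}

/* Average degree over all nodes: visits all n * n matrix entries. */
double average_degree_from_matrix(const unsigned char *k, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += degree_from_matrix(k, n, i);
    return n ? (double)total / (double)n : 0.0;
}
```

Note that the cost of this computation depends only on the number of nodes, not on how many edges the network actually has.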

Clustering Coefficient: The clustering coefficient is
also a local measurement, specific to each node i. However,
it also considers the interconnectivity between the
neighbors of that node. In the case of full representation
in terms of the adjacency matrix, the calculation of this
measurement requires access to all the columns
corresponding to each of the neighbors of node i.
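
As an illustration, the sketch below computes the clustering coefficient of one node from a dense matrix, assuming the usual definition (the fraction of pairs of neighbors of i that are themselves linked). The function name, the row-major `unsigned char` layout, the absence of self-loops, and the convention of returning 0 for nodes with fewer than two neighbors are assumptions of this example rather than details given in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Clustering coefficient of node i from a dense n x n adjacency matrix
   stored row-major (k[r * n + c] is 1 if r and c are linked).
   Assumes no self-loops; returns 0 when i has fewer than two neighbors
   (a convention chosen for this sketch). */
double clustering_from_matrix(const unsigned char *k, size_t n, size_t i)
{
    assert(i < n);
    size_t degree = 0;  /* number of neighbors of i */
    size_t links = 0;   /* number of edges between neighbors of i */

    for (size_t a = 0; a < n; a++) {
        if (!k[i * n + a])
            continue;               /* a is not a neighbor of i */
        degree++;
        /* Scan the entries of neighbor a, looking for other neighbors of i. */
        for (size_t b = a + 1; b < n; b++)
            if (k[i * n + b] && k[a * n + b])
                links++;
    }
    if (degree < 2)
        return 0.0;
    return 2.0 * (double)links / ((double)degree * (double)(degree - 1));
}
```

For each neighbor of i, the inner loop traverses a full line of the matrix, which is the access pattern mentioned in the paragraph above.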

Shortest Path Identification: Given two nodes i and j, 
the shortest topological path between them corresponds 
to the path which has the smallest number of edges. Note 
that it is possible to have two or more distinct shortest 
paths of the same size. 
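
For unweighted networks, the lengths of the shortest paths from one node to all others can be obtained with a breadth-first search. The sketch below is a generic illustration over a dense matrix; the function name, the row-major layout and the use of -1 for unreachable nodes are assumptions of the example, and the text does not state which algorithm was actually used.

```c
#include <assert.h>
#include <stdlib.h>

/* Breadth-first search over a dense n x n adjacency matrix stored
   row-major. Fills dist[v] with the number of edges on a shortest path
   from source to v, or -1 if v cannot be reached.
   Returns 0 on success, -1 if the queue cannot be allocated. */
int bfs_distances(const unsigned char *k, size_t n, size_t source, int *dist)
{
    assert(source < n);
    size_t *queue = malloc(n * sizeof *queue);
    if (!queue)
        return -1;

    for (size_t v = 0; v < n; v++)
        dist[v] = -1;

    size_t head = 0, tail = 0;
    dist[source] = 0;
    queue[tail++] = source;

    while (head < tail) {
        size_t u = queue[head++];
        for (size_t v = 0; v < n; v++) {
            if (k[u * n + v] && dist[v] < 0) {  /* unvisited neighbor */
                dist[v] = dist[u] + 1;
                queue[tail++] = v;
            }
        }
    }
    free(queue);
    return 0;
}
```

Repeating the search from every node yields the all-pairs distances; each node enters the queue at most once, so a queue of n entries suffices.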

Betweenness Centrality: The betweenness centrality is a
property associated with a given node or edge. In both
cases, it refers to the number of shortest paths, considering
all pairs of nodes in the network, which pass through
the given node or edge. The calculation of the betweenness
centrality requires the determination of the shortest
paths for every pair of distinct nodes.



III. NETWORK MODELS 



IV. FULL OR SPARSE REPRESENTATION OF 
THE NETWORKS 

Two main representations were used: adjacency matrix
and adjacency lists [11]. The adjacency matrix is a dense
representation, in the sense that all possible edges in the
network are explicitly included, with one value used to
indicate the presence of each edge, and another value used
otherwise. Adjacency lists are sparse, as only the edges
present in the network are stored. The adjacency
matrix is usually implemented as a static structure, like
the basic arrays available in most computer languages. On
the other hand, adjacency lists are implemented as dynamic
structures and require pointers (a memory position pointing
to another one).

For the elements of the adjacency matrix, we consider
five possibilities, depending on the C language data
type used for each element: double precision (double,
64 bits) and single precision (float, 32 bits) floating
point numbers, integer numbers (int, 32 bits), boolean
values (which can assume only true and false values,
8 bits) and bits. This last element representation does
not have a corresponding type in C, and was implemented
using an int value to store 32 elements of the matrix,
with bit manipulation operations used to access the
individual bit values.
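
A bit-packed matrix of this kind might look like the sketch below. It is an illustration under stated assumptions, not the implementation used here: the type and function names are hypothetical, and `unsigned int` is used instead of `int` so that shifting into the top bit is well defined in C.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Number of matrix elements packed into one word
   (32 where unsigned int is 32 bits wide). */
#define WORD_BITS (sizeof(unsigned int) * CHAR_BIT)

typedef struct {
    size_t n;             /* number of nodes */
    unsigned int *words;  /* n * n bits, packed in row-major order */
} BitMatrix;

/* Allocates a zeroed matrix; returns 0 on success, -1 on failure. */
int bitmatrix_init(BitMatrix *m, size_t n)
{
    size_t nwords = (n * n + WORD_BITS - 1) / WORD_BITS;
    m->n = n;
    m->words = calloc(nwords ? nwords : 1, sizeof *m->words);
    return m->words ? 0 : -1;
}

static void set_bit(BitMatrix *m, size_t i, size_t j)
{
    size_t pos = i * m->n + j;  /* index of the bit for element (i, j) */
    m->words[pos / WORD_BITS] |= 1u << (pos % WORD_BITS);
}

/* Adds the undirected edge (i, j) by setting both symmetric entries. */
void bitmatrix_add_edge(BitMatrix *m, size_t i, size_t j)
{
    assert(i < m->n && j < m->n);
    set_bit(m, i, j);
    set_bit(m, j, i);
}

/* Returns 1 if nodes i and j are linked, 0 otherwise. */
int bitmatrix_has_edge(const BitMatrix *m, size_t i, size_t j)
{
    assert(i < m->n && j < m->n);
    size_t pos = i * m->n + j;
    return (int)((m->words[pos / WORD_BITS] >> (pos % WORD_BITS)) & 1u);
}

void bitmatrix_free(BitMatrix *m)
{
    free(m->words);
    m->words = NULL;
}
```

Every access costs a division, a remainder, a shift and a mask, but the matrix occupies one bit per element instead of 8, 32 or 64.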

In the adjacency lists representation, a list is maintained
for each vertex, containing the indices of the vertices
that are its neighbors. This representation uses an
integer value for each neighbor plus an overhead for list
administration. Nevertheless, it saves memory for
sparse networks.
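
One possible layout for such a structure is sketched below. The names are hypothetical, and a growable array per vertex is assumed here in place of a pointer-linked list; the text does not specify which dynamic structure was used.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *neighbors;   /* indices of the adjacent vertices */
    size_t count;     /* number of neighbors stored (the degree) */
    size_t capacity;  /* allocated slots: part of the list overhead */
} NeighborList;

typedef struct {
    size_t n;             /* number of vertices */
    NeighborList *lists;  /* one list per vertex */
} ListGraph;

/* Creates n empty lists; returns 0 on success, -1 on failure. */
int listgraph_init(ListGraph *g, size_t n)
{
    g->n = n;
    g->lists = calloc(n ? n : 1, sizeof *g->lists);
    return g->lists ? 0 : -1;
}

static int push_neighbor(NeighborList *l, int v)
{
    if (l->count == l->capacity) {  /* list is full: double its size */
        size_t cap = l->capacity ? 2 * l->capacity : 4;
        int *p = realloc(l->neighbors, cap * sizeof *p);
        if (!p)
            return -1;
        l->neighbors = p;
        l->capacity = cap;
    }
    l->neighbors[l->count++] = v;
    return 0;
}

/* Adds the undirected edge (i, j): each endpoint records the other. */
int listgraph_add_edge(ListGraph *g, size_t i, size_t j)
{
    assert(i < g->n && j < g->n);
    if (push_neighbor(&g->lists[i], (int)j) != 0)
        return -1;
    return push_neighbor(&g->lists[j], (int)i);
}

/* Linear search of the list of i for vertex j. */
int listgraph_has_edge(const ListGraph *g, size_t i, size_t j)
{
    assert(i < g->n && j < g->n);
    const NeighborList *l = &g->lists[i];
    for (size_t c = 0; c < l->count; c++)
        if (l->neighbors[c] == (int)j)
            return 1;
    return 0;
}

void listgraph_free(ListGraph *g)
{
    for (size_t i = 0; i < g->n; i++)
        free(g->lists[i].neighbors);
    free(g->lists);
    g->lists = NULL;
    g->n = 0;
}
```

With this layout the degree of a vertex is available directly as `count`, whereas testing for a specific edge requires scanning a list.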



In the following we use three network models: two
models due to Erdős and Rényi (ER) and the scale-free
model of Barabási and Albert [10] (BA). The first model,
denoted ER (probability), connects each pair of vertices
with a fixed probability p. The average degree in this
model is p(N - 1), where N is the number of vertices
in the network. The second model, denoted ER (edges),
uses a fixed number E of edges, and assigns each edge
to a randomly chosen pair of nodes. The average degree
of the network is 2E/N. These two models have
similar statistical properties, but are included here due
to their different behavior during network construction:
as the first model must consider all pairs of nodes, it is
computationally intensive for large networks; the second
model has construction time proportional to the number
of edges, and is therefore faster for sparse networks.
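
The difference in construction cost between the two ER variants can be seen in the following sketch. It is illustrative only: the function names and the dense `unsigned char` matrix are assumptions, the caller is expected to pass a zeroed matrix, and rejecting self-loops and repeated edges in the second generator is a choice made for this example that the text does not describe.

```c
#include <assert.h>
#include <stdlib.h>

/* ER (probability): each pair (i, j) with i < j is linked with
   probability p. All n(n-1)/2 pairs are visited regardless of p.
   k is a zeroed n x n row-major matrix. Returns the number of edges. */
size_t er_probability(unsigned char *k, size_t n, double p)
{
    size_t edges = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            /* rand() / (RAND_MAX + 1.0) is uniform in [0, 1) */
            if (rand() / (RAND_MAX + 1.0) < p) {
                k[i * n + j] = k[j * n + i] = 1;
                edges++;
            }
        }
    }
    return edges;
}

/* ER (edges): places exactly e edges on randomly chosen pairs.
   Self-loops and already existing edges are drawn again (an
   assumption of this sketch). The work done grows with e, not n^2. */
void er_edges(unsigned char *k, size_t n, size_t e)
{
    assert(n >= 2 && e <= n * (n - 1) / 2);
    size_t placed = 0;
    while (placed < e) {
        size_t i = (size_t)rand() % n;
        size_t j = (size_t)rand() % n;
        if (i == j || k[i * n + j])
            continue;
        k[i * n + j] = k[j * n + i] = 1;
        placed++;
    }
}
```

The first generator always executes the double loop over all pairs, while the second performs roughly one iteration per edge as long as the network is sparse.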

The Barabási-Albert networks are constructed starting
with a small number of vertices and adding vertices
one by one. Each new vertex is connected to m existing
vertices, chosen using a linear preferential attachment
rule in which vertices with higher degrees have higher
probabilities of being chosen. The resulting average degree is
given by 2m.
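
A common way to realize linear preferential attachment is to sample uniformly from the list of all edge endpoints, in which a vertex of degree d appears d times. The sketch below follows that idea. The function name, the seed network (a clique of m + 1 vertices) and the rejection of repeated targets are assumptions of this example, not details given in the text.

```c
#include <assert.h>
#include <stdlib.h>

/* Generates a BA network with n vertices and m links per new vertex.
   Edge e joins ends[2 * e] and ends[2 * e + 1]. The array of endpoints
   doubles as the sampling pool: picking one of its entries uniformly
   selects a vertex with probability proportional to its degree.
   Returns the number of edges, or 0 if allocation fails.
   Note: rand() limits the usable pool size to RAND_MAX entries. */
size_t ba_generate(size_t n, size_t m, int **ends_out)
{
    assert(m >= 1 && n > m);
    size_t seed_edges = m * (m + 1) / 2;
    size_t max_edges = seed_edges + (n - m - 1) * m;
    int *ends = malloc(2 * max_edges * sizeof *ends);
    if (!ends)
        return 0;

    size_t e = 0;
    /* Seed network: vertices 0..m fully connected (an assumption). */
    for (size_t i = 0; i <= m; i++) {
        for (size_t j = i + 1; j <= m; j++) {
            ends[2 * e] = (int)i;
            ends[2 * e + 1] = (int)j;
            e++;
        }
    }

    for (size_t v = m + 1; v < n; v++) {
        size_t pool = 2 * e;  /* endpoints that existed before v arrived */
        size_t chosen = 0;
        while (chosen < m) {
            int target = ends[(size_t)rand() % pool];
            int duplicate = 0;
            for (size_t c = 0; c < chosen; c++)
                if (ends[2 * (e + c) + 1] == target)
                    duplicate = 1;
            if (duplicate)
                continue;     /* v is already linked to this target */
            ends[2 * (e + chosen)] = (int)v;
            ends[2 * (e + chosen) + 1] = target;
            chosen++;
        }
        e += m;
    }
    *ends_out = ends;
    return e;
}
```

Since each new vertex contributes m edges, the number of edges approaches mN for large N, consistent with the average degree 2m stated above. The edge list produced here can then be loaded into either a matrix or a list representation.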



V. RESULTS AND DISCUSSION 

We study the execution time needed for the generation 
of the network and for the computation of the following 
network measurements: average degree, clustering coeffi- 
cient, all-pairs distances and betweenness centrality. The 
effect of the various network representations is evaluated 
as a function of the network size and average degree. 



A. Network generation times for different network
sizes

We first consider the effect of network size on the ex- 
ecution time needed for the generation of the networks, 
using different network representations. 



1. ER (probability) model 

We start by investigating how the generation of ER 
networks is affected by the choice of different graph rep- 
resentations. The execution times required to produce 

2. ER (edges) model

Figure 1(b) depicts the execution times obtained for the
generation of ER (edges) networks for different network
sizes and average degree equal to 10.

Unlike the results obtained for the ER (probability) model,
the adoption of different types of graph representations now
has a marked effect on the respective execution times. In
particular, the improvements allowed by the more
memory-effective representations (bit and list) are now evident.
Interestingly, a sharp change of execution times in the
matrix cases is observed at about N = 1000. This
abrupt increase occurs when the size of the graph exceeds
the capacity of the cache of the computer.
Though the list representation is initially slower than the
matrix cases, it becomes progressively faster relative to
them as N increases.

The substantial differences now observed between the
execution times obtained for the diverse representations
arise because choosing the pairs to be connected takes
less time in this model, so that the intrinsic access time
of each type of representation becomes more pronounced.



3. BA model 

The generation times obtained for BA networks with
average degree 10 are shown in Figure 1(c). As with ER,
the list and bit representations provide the fastest execution
times for large values of N. In the region where
the cache is large enough to hold the graph,
the bits representation is the fastest option, but as
N increases the list implementation becomes progressively
more effective. This is a consequence of the fact
that the computational cost with a list representation is
linear in N, while the cost with the matrix representation
increases with N^2. As before, the matrix-based
implementations tend to be slower.



Figure 1: Time taken to generate networks of different sizes
(number of nodes) for the Erdős-Rényi model with fixed
probability (a), the Erdős-Rényi model with fixed number of edges
(b) and the Barabási-Albert model (c), using various graph
representations.



B. Computation time of some measurements for 
different network sizes 



We now turn our attention to the effect of network size
on the computation time of some network measurements,
for the various network representations.



ER networks of several sizes, and average degree 10, are
shown in Figure 1(a).

Generally speaking, the different types of graph
representation had little effect on the execution time.

The execution times are similar because
most of the computational effort is spent
considering all pairs of nodes to be connected with constant
probability.



1. Average degree



We now turn our attention to the effect of different
graph representations on the execution time required for
the calculation of some of the principal measurements of
the topology of the graphs. We start by investigating the
execution times required for the determination of the average
degree. Figure 2(a) depicts the results obtained for BA
networks with varying sizes and average degree equal to
10.

It is clear that the list implementation allows a dra- 
matic reduction of the execution times for most network 
sizes. The other implementations required similar execu- 
tion times and, as could be expected, the double repre- 
sentation implied the longest execution times. 



2. Clustering coefficient 



to 10). Now, we proceed to investigate how the speed 
is influenced by different values of average degree. This 
will allow us to get insights about the generality of the 
previously observed trends. In principle, it could be ex- 
pected that the larger the average degree of a network, 
the smaller would be the benefits provided by the lists, 
because the matrices would become less sparse. There- 
fore, special attention is henceforth focused on this po- 
tential effect. 



Figure 2(b) shows the execution times obtained for the
calculation of the clustering coefficient of BA networks
of several sizes and average degree 10. Because
this measurement demands more computations
than the average degree, the execution times
are larger than those in Figure 2(a). Interestingly,
the several tested representations led to similar execution
times, with the list and bool implementations providing
particularly good efficiency for large values of N.



3. All-pairs distances 

Figure 2(c) presents the execution times obtained while
calculating the average shortest path lengths for several
BA networks with average degree equal to 10. Similarly
to the betweenness centrality, this measurement also
requires intensive computations.

The results are similar to those obtained for the
betweenness, but the relative improvement allowed by the
list implementation is even larger.



4. Betweenness centrality

We also investigated how the time required for the
calculation of the betweenness centrality varied with the
several adopted implementations. Figure 2(d) shows the
results obtained for BA networks of several sizes and
average degree equal to 10.

The substantially more complex nature of this measurement
is clearly reflected in the larger execution
times. While little difference can be noticed among
most implementations, the list representation allowed,
again, substantially faster execution times, being
the fastest option for all values of N. Indeed, the relative
improvement obtained with lists clearly seems to increase
with the network size. This implies that the use of lists
becomes critical for allowing the calculation of betweenness in
particularly large networks.



C. Network generation times for different average 
degrees 

So far we have probed how the execution time varies 
with the network size for a fixed average degree (equal 



1. Network generation time for the BA model 

Figure [3] shows the execution times, in terms of the av- 
erage degree, obtained for generating BA networks while 
using the several representations. The network size is 
henceforth fixed at N = 10000. 

The results confirm that the use of lists guarantees
higher speed up to an average degree of about 100, with
this advantage decreasing steeply thereafter. Particularly
interesting is the behavior of the bits implementation, which
overtakes the lists from average degree 20. This fact
suggests that the execution time is strongly
affected by the memory demanded by each implementation.
With the increase of the average degree,
the matrix implementations become progressively more
effective, while the bits, and particularly the list,
implementations lose their effectiveness.



D. Computation time of some measurements for 
different average degrees 

Now we consider the effect of the average degree in the 
computation of some network measurements for different 
graph representations. 



1. Average degree 

Figure 4(a) depicts the execution time, in terms of the
average degree, required to calculate the average degrees
of BA networks with size N = 10000.

While the speed of the matrix implementations does not
depend on the average degree of the original network, the
relative efficiency of the list representation is dramatic for
average degree values up to about 80, decreasing progressively
for larger values, until becoming rather ineffective.
With the matrix representation, the computation of the
average degree involves adding along the rows, which has
a constant cost for fixed N and is therefore independent of the
network average degree. In contrast, in the case of the
list representation the average degree is calculated over
a varying number of elements, which grows linearly with
the network degree.

Figure 2: Computation time for some network measurements [average degree (a), clustering coefficient (b), average distances
(c) and node betweenness centrality (d)] as a function of network size for BA networks of average degree 10.

Figure 3: Network generation time (BA model with N =
10000) as a function of average degree.



2. Clustering coefficient

The dependency of the execution times required for the
calculation of the clustering coefficient on the average
degree of BA networks is shown in Figure 4(b).

The general trends observed in this graph are similar
to those obtained for the network generation and average
degree calculation. However, the relative advantage
of the list implementation is less marked, becoming less
effective for networks with average degrees larger than
10.

The calculation of the clustering coefficient requires the
identification of the links between the immediate neighbors
of the reference node. In other words, it is necessary
to check the existence of a link between each pair of nodes
i and j connected to the reference node. In the case of the
matrix representation, this can be done easily by checking
the position (i,j) in the adjacency matrix. However,
in the case of the list representation, this requires going
through the whole list of nodes that are adjacent to node
i while searching for node j.


E. All-pairs distances 

The estimation of the shortest distances in terms of the
average degree is shown in Figure 4(c). The relationship
between the times required for the calculation of these
measurements is similar to the three previous cases, with
the difference that the critical average degree for which
the dynamic representation is no longer the fastest option
is now between 50 and 60 for the BA model. This
critical degree certainly depends on the size of the
network.

Figure 4: Computation time for some network measurements [average degree (a), clustering coefficient (b), average distances
(c) and node betweenness centrality (d)] as a function of average degree for BA networks of size N = 10000.



F. Betweenness centrality 

Regarding the relationship between the average degree
and the betweenness centrality, shown in Figure 4(d), a
relationship similar to that obtained for the two previous
cases has been observed. However, the average degree at
which the dynamic representation becomes
worse than the others is now higher. This degree depends
on the size of the network: in the case of N = 4000, this
critical average degree is 100.



VI. CONCLUDING REMARKS

The choice of an adequate network representation can have a
major impact on the overall execution time. More specifically,
we compared full and sparse schemes for representing
the connectivity of the networks while generating
networks and calculating several measurements of their
topology. The sparse representation was generally
more effective than the full scheme, with the exception
of the cases when the networks have very large average
degree. We also investigated the effect of having diverse
data types such as byte, integer, float, double and bit.
In general, the shorter data types led to superior performance
as a consequence of the smaller amount of memory
to be accessed.

The obtained results and trends suggest a number of
further investigations. For instance, it would be interesting
to consider other network models and measurements,
as well as to assess the effect of different types of hardware,
compilers and operating systems.